
A novel parallel algorithm for frequent itemsets mining in massive small files datasets


Abstract

In big data analysis, frequent itemsets mining plays a key role in discovering associations, correlations, and causality. Some traditional frequent itemsets mining algorithms cannot handle massive small files datasets effectively: they suffer from high memory cost, high I/O overhead, and low computing performance. In this paper, we propose a novel parallel frequent itemsets mining algorithm based on the FP-Growth algorithm and discuss its applications. First, we introduce a small files processing strategy for massive small files datasets to compensate for the low read-write speed and low processing efficiency in Hadoop. Second, we use MapReduce to redesign the FP-Growth algorithm for parallel computing, thereby improving the overall performance of frequent itemsets mining. Finally, we apply the proposed algorithm to the association analysis of data from the national college entrance examination and admission of China. The experimental results show that the proposed algorithm is feasible and valid, achieves a good speedup and a higher mining efficiency, and can meet the practical requirements of frequent itemsets mining for massive small files datasets. © 2014 ISSN 2185-2766.
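The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that small files are merged into larger batches before processing, and it shows only the item-counting scan that FP-Growth performs before building its FP-tree, written as a map step and a reduce step in plain Python rather than on Hadoop. All function names (`merge_small_files`, `map_count`, `reduce_counts`, `frequent_items`) and the `batch_size` parameter are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import chain


def merge_small_files(files, batch_size):
    """Assumed strategy: combine every `batch_size` small files into one batch.

    Each file is modelled as a list of transactions (lists of items).
    """
    batches = []
    for start in range(0, len(files), batch_size):
        group = files[start:start + batch_size]
        batches.append(list(chain.from_iterable(group)))
    return batches


def map_count(batch):
    """Map step: count each item at most once per transaction in one batch."""
    counts = Counter()
    for transaction in batch:
        counts.update(set(transaction))
    return counts


def reduce_counts(left, right):
    """Reduce step: add two partial support tables together."""
    return left + right


def frequent_items(files, batch_size, min_support):
    """Return items whose total support is at least `min_support`."""
    batches = merge_small_files(files, batch_size)
    partial_counts = map(map_count, batches)  # could run on separate workers
    total = reduce(reduce_counts, partial_counts, Counter())
    return {item: n for item, n in total.items() if n >= min_support}


if __name__ == "__main__":
    # Three tiny "files" holding four transactions in total (made-up data).
    files = [
        [["a", "b"], ["b", "c"]],
        [["a", "b", "c"]],
        [["b"]],
    ]
    print(frequent_items(files, batch_size=2, min_support=2))
```

Because each batch is counted independently and the partial tables are simply added, the map calls could be distributed across workers; merging files first reduces the number of tasks that each handle very little data.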
